Partial Generalized Additive Models: An Information-Theoretic Approach for Dealing With Concurvity and Selecting Variables
Authors
Abstract
Scientists are often interested in which covariates are important, and how these covariates affect the response variable, rather than just making predictions. This requires inputs from both statistical modeling and background knowledge. Generalized additive models (GAMs) are a class of interpretable, multivariate nonparametric regression models which are very useful data exploration tools for these purposes, but concurvity among covariates (the nonlinear analogue of collinearity for linear regression) can lead GAMs to produce unstable or even wrong estimates of the covariates’ functional effects. We develop a new procedure called partial generalized additive models (pGAM), based on mutual information (MI), a measure of nonlinear dependence between variables. Our procedure is similar in spirit to the Gram–Schmidt method for linear least squares. By building a GAM on a selected set of transformed variables, pGAM produces more stable models, selects variables parsimoniously, and provides insight into the nature of concurvity between the covariates by calculating functional dependencies among them. With simulation experiments and real-data examples, we show that pGAM produces much better estimates of the covariates’ functional effects, and also incorporates a reasonable and meaningful variable selection method. R code for fitting pGAMs is available online (see Supplemental Materials Section).
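The abstract's premise, that concurvity is a nonlinear dependence which a linear measure can miss while mutual information detects it, can be illustrated with a small sketch. This is not the pGAM procedure and not the authors' R code. The histogram-based plug-in MI estimator, the bin count, the simulated covariates, and the function names `binned_mutual_information` and `pearson` are all assumptions made only for this illustration.

```python
import math
import random


def binned_mutual_information(x, y, bins=8):
    """Plug-in mutual information estimate (in nats) from an equal-width 2-D histogram.

    Assumption for illustration only: the source does not say which MI
    estimator pGAM uses.
    """
    def bin_index(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    n = len(x)
    x_lo, x_hi, y_lo, y_hi = min(x), max(x), min(y), max(y)
    joint = {}
    px = [0] * bins
    py = [0] * bins
    for xv, yv in zip(x, y):
        i = bin_index(xv, x_lo, x_hi)
        j = bin_index(yv, y_lo, y_hi)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    # Sum over occupied cells of p(i, j) * log(p(i, j) / (p(i) * p(j))).
    return sum(
        c / n * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def pearson(x, y):
    """Sample Pearson correlation, a purely linear dependence measure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
x1 = [rng.uniform(-1, 1) for _ in range(2000)]
x2 = [a * a + rng.gauss(0, 0.05) for a in x1]   # nonlinear function of x1 (concurvity)
x3 = [rng.uniform(-1, 1) for _ in range(2000)]  # independent of x1

print("corr(x1, x2):", pearson(x1, x2))
print("MI(x1, x2):  ", binned_mutual_information(x1, x2))
print("MI(x1, x3):  ", binned_mutual_information(x1, x3))
```

In this simulated setting the correlation between `x1` and `x2` is close to zero even though `x2` is almost a deterministic function of `x1`, while the MI estimate separates the dependent pair from the independent one. The plug-in estimator is biased upward for small samples or many bins, so the independent pair gives a small positive value rather than exactly zero.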
Similar resources
A bootstrap method to avoid the effect of concurvity in generalised additive models in time series studies of air pollution
Background: In recent years a great number of studies have applied generalised additive models (GAMs) to time series data to estimate the short term health effects of air pollution. Lately, however, it has been found that concurvity—the non-parametric analogue of multicollinearity—might lead to underestimation of standard errors of the effects of independent variables. Underestimation of standa...
A bootstrap method to avoid the effect of concurvity in generalised additive models in time series studies of air pollution.
BACKGROUND In recent years a great number of studies have applied generalised additive models (GAMs) to time series data to estimate the short term health effects of air pollution. Lately, however, it has been found that concurvity--the non-parametric analogue of multicollinearity--might lead to underestimation of standard errors of the effects of independent variables. Underestimation of stand...
Applying the generalized additive model to determine the type of relationship between retinopathy risk factors in diabetic patients in Tehran
Background: One of the most important complications of diabetes is diabetic retinopathy, which causes blindness in 10,000 people every year. Various studies have examined retinopathy risk factors in diabetic patients. This study was carried out to examine the type of relationship between retinopathy risk factors and the occurrence of retinopathy using generalized additive models. T...
Predicting the geographical distribution of Alopecurus textilis Boiss rangeland species on basis Consensus approach of climate change in Mazandaran province
Climate change plays an important role in the distribution of plant species. Statistical species distribution models (SDMs) are widely used to predict changes in species distribution under climate change scenarios. In the present study, the distribution of Alopecurus textilis in current and future (2050) climate conditions under the influence of climate change and two scenarios of RCP 4...
Exploring bias in a generalized additive model for spatial air pollution data.
During the past few years, the generalized additive model (GAM) has become a standard tool for epidemiologic analysis exploring the effect of air pollution on population health. Recently, the use of the GAM has been extended from time-series data to spatial data. Still more recently, it has been suggested that the use of GAMs to analyze time-series data results in air pollution risk estimates b...